ACME Company Report

Investment Potential

With many years of experience in small domestic appliances and sustainability as a core value, ACME Company competes with the major players on the European market. Its unique user experience, loyal customer base, and reliable, high-quality service make it worth investing in.

Problem Statement

In Germany, sales in the household appliance market are expected to grow from 2021 to 2025 at an average annual growth rate of around 1.5%. In 2021, sales are expected to reach EUR 9,963 million. The global market is highly fragmented and divided between international and regional players. Many regional players, ACME Company among them, struggle to compete with the international players on a cost basis. The company has therefore shifted to a customer-centric business model, but it still struggles to refine that model enough to retain its customer base.
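As a back-of-the-envelope check (an illustration, not a figure from the report), compounding the 2021 base of EUR 9,963 million at 1.5% per year gives the implied 2025 market volume:

```r
# Implied 2025 market volume at ~1.5% CAGR from a 2021 base (illustrative)
sales_2021 <- 9963                       # EUR million
cagr <- 0.015
sales_2025 <- sales_2021 * (1 + cagr)^4  # four years of growth
round(sales_2025)                        # ≈ 10574, i.e. roughly EUR 10.6 billion
```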

Event Log Data Analysis

ACME wants a solution that identifies deviations in process performance. To this end, ACME extracted event log data from its ERP system.

Package download and installation

First of all, we need to install some important packages:

install.packages("tidyverse",repos="http://cran.us.r-project.org")
## 
## The downloaded binary packages are in
##  /var/folders/9f/74q5ctws39b7kcn4_836ldj40000gn/T//RtmpgmcTWm/downloaded_packages
install.packages("readr",repos="http://cran.us.r-project.org")
install.packages("edeaR",repos="http://cran.us.r-project.org")
install.packages("dplyr",repos="http://cran.us.r-project.org")
install.packages("ggplot2",repos="http://cran.us.r-project.org")
install.packages("lubridate",repos="http://cran.us.r-project.org")
install.packages("bupaR",repos="http://cran.us.r-project.org")
install.packages("processmapR",repos="http://cran.us.r-project.org")
install.packages("FactoMineR",repos="https://CRAN.R-project.org")
install.packages("factoextra",repos="https://CRAN.R-project.org")
install.packages("kohonen",repos="https://CRAN.R-project.org")
install.packages("cluster",repos="https://CRAN.R-project.org")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.1     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readr)
library(edeaR)
## 
## Attaching package: 'edeaR'
## The following object is masked from 'package:base':
## 
##     setdiff
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(dplyr)
library(ggplot2)
library(bupaR)
## 
## Attaching package: 'bupaR'
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:utils':
## 
##     timestamp
library(FactoMineR)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(kohonen)
## 
## Attaching package: 'kohonen'
## The following object is masked from 'package:purrr':
## 
##     map
library(cluster)

Data Import and Pre-processing

Loading Data

ACME_EventLog_data <- read_delim("EventLog_ACME_TeachingCase.csv", delim = ";", escape_double = FALSE, trim_ws = TRUE)
## Rows: 178078 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr  (4): CASE_ID, ACTIVITY, DEVICETYPE, SERVICEPOINT
## dbl  (1): REPAIR_IN_TIME_5D
## dttm (1): TIMESTAMP
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Structure and first look

str(ACME_EventLog_data)
## spec_tbl_df[,6] [178,078 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ CASE_ID          : chr [1:178078] "Case10" "Case10" "Case10" "Case10" ...
##  $ ACTIVITY         : chr [1:178078] "Creation" "Letter" "DeviceReceived" "StockEntry" ...
##  $ TIMESTAMP        : POSIXct[1:178078], format: "2018-01-02 13:39:47" "2018-01-05 00:00:00" ...
##  $ REPAIR_IN_TIME_5D: num [1:178078] 0 0 0 0 0 0 0 0 0 0 ...
##  $ DEVICETYPE       : chr [1:178078] "AB52" "AB52" "AB52" "AB52" ...
##  $ SERVICEPOINT     : chr [1:178078] "E" "E" "E" "E" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   CASE_ID = col_character(),
##   ..   ACTIVITY = col_character(),
##   ..   TIMESTAMP = col_datetime(format = ""),
##   ..   REPAIR_IN_TIME_5D = col_double(),
##   ..   DEVICETYPE = col_character(),
##   ..   SERVICEPOINT = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
ACME_EventLog_data %>% head(n = 10)
## # A tibble: 10 x 6
##    CASE_ID ACTIVITY TIMESTAMP           REPAIR_IN_TIME_… DEVICETYPE SERVICEPOINT
##    <chr>   <chr>    <dttm>                         <dbl> <chr>      <chr>       
##  1 Case10  Creation 2018-01-02 13:39:47                0 AB52       E           
##  2 Case10  Letter   2018-01-05 00:00:00                0 AB52       E           
##  3 Case10  DeviceR… 2018-01-05 16:45:34                0 AB52       E           
##  4 Case10  StockEn… 2018-01-17 00:00:00                0 AB52       E           
##  5 Case10  InDeliv… 2018-01-17 00:00:00                0 AB52       E           
##  6 Case10  NoteWor… 2018-01-17 07:37:19                0 AB52       E           
##  7 Case10  Complet… 2018-01-17 09:34:32                0 AB52       E           
##  8 Case100 Creation 2018-01-02 15:43:48                0 AB41       E           
##  9 Case100 NoteHot… 2018-01-02 15:44:41                0 AB41       E           
## 10 Case100 Letter   2018-01-08 00:00:00                0 AB41       E

Timestamp

ACME_EventLog_data$TIMESTAMP <- ymd_hms(ACME_EventLog_data$TIMESTAMP, tz = "CET")

Check for missing values in the data

sum(is.na(ACME_EventLog_data$DEVICETYPE))
## [1] 20
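Checking a single column can hide gaps elsewhere; `colSums(is.na(...))` reports missing counts for every column at once. A minimal sketch on a toy data frame (the same call applies unchanged to ACME_EventLog_data):

```r
# Per-column missing-value counts, demonstrated on a toy data frame
toy <- data.frame(DEVICETYPE   = c("AB52", NA, "AB41"),
                  SERVICEPOINT = c("E", "E", NA))
colSums(is.na(toy))  # one NA in each column
```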

Factor creation

ACME_EventLog_data$CASE_ID <- as.factor(ACME_EventLog_data$CASE_ID)
ACME_EventLog_data$ACTIVITY <- as.factor(ACME_EventLog_data$ACTIVITY)
ACME_EventLog_data$REPAIR_IN_TIME_5D <- as.factor(ACME_EventLog_data$REPAIR_IN_TIME_5D)
ACME_EventLog_data$DEVICETYPE <- as.factor(ACME_EventLog_data$DEVICETYPE)
ACME_EventLog_data$SERVICEPOINT <- as.factor(ACME_EventLog_data$SERVICEPOINT)
ACME_EventLog_data$Lifecycle <- as.factor("Start") # extra variable required later by eventlog()

Additional variable: create a data frame that assigns a Month/Year to each case based on the earliest record in the log

ACME_month <- ACME_EventLog_data %>% 
  group_by(CASE_ID) %>% 
  arrange(as.numeric(gsub("Case","", CASE_ID)),TIMESTAMP) %>% # sort by CASE_ID and then TIMESTAMP; CASE_ID is a string containing a number, so we strip the "Case" prefix to sort numerically
  filter(row_number() == 1) %>% # keep only the earliest row per case
  mutate(Month = floor_date(TIMESTAMP, "month")) %>% # create the new variable
  select(CASE_ID,Month) # select the columns "CASE_ID" and "Month"
  
head(ACME_month,n = 10)
## # A tibble: 10 x 2
## # Groups:   CASE_ID [10]
##    CASE_ID Month              
##    <fct>   <dttm>             
##  1 Case2   2018-01-01 00:00:00
##  2 Case3   2018-01-01 00:00:00
##  3 Case4   2018-01-01 00:00:00
##  4 Case5   2018-01-01 00:00:00
##  5 Case6   2018-01-01 00:00:00
##  6 Case8   2018-01-01 00:00:00
##  7 Case9   2018-01-01 00:00:00
##  8 Case10  2018-01-01 00:00:00
##  9 Case11  2018-01-01 00:00:00
## 10 Case13  2018-01-01 00:00:00

Add the Month/Year to the data

ACME_EventLog_data <- ACME_EventLog_data %>%
  left_join(ACME_month, by = "CASE_ID") %>%
  arrange(as.numeric(gsub("Case","", CASE_ID)),TIMESTAMP)

Building the Event Log

ACME.log <- ACME_EventLog_data %>%
    mutate(activity_instance = 1:nrow(.)) %>% # add an activity_instance column (a unique id per row) to meet eventlog() requirements
    eventlog(
        case_id = "CASE_ID",
        activity_id = "ACTIVITY",
        activity_instance_id = "activity_instance",
        lifecycle_id = "Lifecycle",
        timestamp = "TIMESTAMP",
        resource_id = "DEVICETYPE"
    )

General Analysis

Summarize the event log

ACME.log %>% summary
## Number of events:  178078
## Number of cases:  23906
## Number of traces:  2720
## Number of distinct activities:  13
## Average trace length:  7.449092
## 
## Start eventlog:  2013-05-22 10:39:39
## End eventlog:  2019-06-28 08:39:30
##    CASE_ID                    ACTIVITY       TIMESTAMP                  
##  Length:178078      Creation      :23907   Min.   :2013-05-22 10:39:39  
##  Class :character   DeviceReceived:23906   1st Qu.:2018-06-11 09:41:52  
##  Mode  :character   InDelivery    :23906   Median :2018-10-31 10:17:36  
##                     Completed     :23890   Mean   :2018-10-16 14:42:48  
##                     NoteHotline   :17764   3rd Qu.:2019-02-23 10:12:57  
##                     Letter        :16453   Max.   :2019-06-28 08:39:30  
##                     (Other)       :48252                                
##  REPAIR_IN_TIME_5D   DEVICETYPE     SERVICEPOINT   Lifecycle     
##  0:120029          AB52   :26726   E      :63851   Start:178078  
##  1: 58049          AB49   :15740   L      :63549                 
##                    AB62   :13931   J      :16178                 
##                    AB61   :11999   G      :12832                 
##                    AB20   :10080   K      : 1837                 
##                    (Other):99582   (Other): 3313                 
##                    NA's   :   20   NA's   :16518                 
##      Month                     activity_instance      .order      
##  Min.   :2013-05-01 00:00:00   Length:178078      Min.   :     1  
##  1st Qu.:2018-05-01 00:00:00   Class :character   1st Qu.: 44520  
##  Median :2018-10-01 00:00:00   Mode  :character   Median : 89040  
##  Mean   :2018-09-18 20:25:50                      Mean   : 89040  
##  3rd Qu.:2019-02-01 00:00:00                      3rd Qu.:133559  
##  Max.   :2019-06-01 00:00:00                      Max.   :178078  
## 

Case Analysis

Case count over time

Case count per month

case_count_per_month <- ACME.log %>% 
  distinct(Month, CASE_ID) %>% 
  group_by(Month) %>% 
  summarise(Case_count = n()) %>%
  mutate(Anteil = Case_count/sum(Case_count)) %>% # Anteil = share of all cases
  arrange(Case_count)

Total cases

case_count_per_month %>%
  summarise(Total_case = sum(Case_count))
## # A tibble: 1 x 1
##   Total_case
##        <int>
## 1      23906

Presenting it with a bar chart

barplot(t(case_count_per_month[c('Case_count')]),
        names.arg=t(case_count_per_month[c('Month')]),
        las=1, 
        horiz=TRUE, # horizontal bar chart
        xlab="Number of cases",
        main="Case count over time")

Presenting it with a line chart

case_count_per_month %>%
  ggplot(aes(x = as.Date(Month), y = Case_count)) +
  
  theme_minimal() +
  geom_line() +
  scale_x_date(date_labels = "%d-%m-%Y") +
  labs(title = "Number of cases over time",
       x = "Time",
       y = "Number")

As we can see, the number of cases rose over time, peaked in January 2019, and then declined.

How many cases were completed within 5 days? (Number of repair-in-time cases)

Repair_intime <- ACME.log %>% 
  distinct(CASE_ID,REPAIR_IN_TIME_5D,.keep_all = TRUE) %>% # keep one row per case
  count(REPAIR_IN_TIME_5D, name ="count_repair") %>% # count cases repaired within 5 days vs. not
  mutate(Anteil = count_repair / sum(count_repair)) # Anteil = relative share

The share of cases completed within 5 days was unimpressive at just 34.65%.

Number of activities per case

Creating Activity_per_case table

Activity_per_Case <- ACME.log %>%
  distinct(CASE_ID,ACTIVITY) %>% 
  count(CASE_ID,name = "Activity_per_case") %>% 
  count(Activity_per_case,name = "Count")

Presenting it with a chart

Activity_per_Case %>%
  ggplot(aes(x = reorder(Activity_per_case,Count), y = Count)) +
  
  theme_minimal() +
  geom_col() +

  labs(title = "Most frequent Activities length per Case",
       x = "Activities length per Case",
       y = "Count")

Most cases contain 6 to 9 activities.

Frequency Analysis

Activity Frequency

Create Activity_Frequency table

Activity_Frequency <- ACME.log %>%
  activity_frequency(level = "activity") # absolute and relative frequency of each activity
Activity_Frequency
## # A tibble: 13 x 3
##    ACTIVITY       absolute  relative
##    <fct>             <int>     <dbl>
##  1 Creation          23907 0.134    
##  2 DeviceReceived    23906 0.134    
##  3 InDelivery        23906 0.134    
##  4 Completed         23890 0.134    
##  5 NoteHotline       17764 0.0998   
##  6 Letter            16453 0.0924   
##  7 NoteWorkshop      16010 0.0899   
##  8 Approved           9708 0.0545   
##  9 Transmission       9266 0.0520   
## 10 StockEntry         8726 0.0490   
## 11 StatusRequest      4527 0.0254   
## 12 FreeticketComp       13 0.0000730
## 13 FreeticketCust        2 0.0000112

Here we can see that almost every created case is eventually completed: "Creation", "DeviceReceived", "InDelivery", and "Completed" occur with nearly identical frequencies.

Presenting it with a bar chart

Activity_Frequency %>%
  ggplot(aes(x = reorder(ACTIVITY,absolute), y = absolute)) + #sort bar
  
  theme_minimal() +
  geom_col() +
  coord_flip() + 
  labs(title = "Most frequent Activities",
       x = "Activity",
       y = "Count")

Resource frequency

Creating Resource_Frequency table

Resource_Frequency <- ACME.log %>%
  distinct(CASE_ID,.keep_all = TRUE) %>%
  filter_resource_frequency(perc = 1) %>% # keep all resources (100%)
  resources() # absolute and relative frequency of each resource

Presenting it with a bar chart

Resource_Frequency %>%
  ggplot(aes(x = reorder(DEVICETYPE,absolute_frequency), y = absolute_frequency)) +
  
  theme_minimal() +
  geom_col() +
  scale_x_discrete(guide = guide_axis(angle = 90)) +
  labs(title = "Most frequent Resources",
       x = "Resources",
       y = "Count")

This chart shows which device types require repair most often.

Trace Analysis

Average trace length

ACME.log %>%
    filter_trace_length(percentage = 1) %>% # keep all traces (100%)
    trace_length() # distribution of trace length (number of activities per trace)
##       min        q1    median      mean        q3       max    st_dev       iqr 
##  4.000000  6.000000  7.000000  7.449092  8.000000 13.000000  1.472449  2.000000

A trace contains on average around 7 to 8 activities.

Traces' absolute and relative frequency, and throughput time

trace_coverage <- ACME.log %>% # absolute frequency of each trace
  trace_coverage(level = "case") %>%
  distinct(trace,absolute)

trace_coverage <- trace_coverage %>%
  mutate(length = sapply(strsplit(trace_coverage$trace, ","), length)) # calculate each trace's length

trace_throughput_time <- throughput_time(ACME.log, "trace") # average throughput time of each trace

trace_explorer <- merge(trace_coverage,trace_throughput_time,by = "trace") # merge trace frequency and throughput time

head(trace_explorer[,-c(1)],10)
##    absolute length relative_trace_frequency       min        q1      mean
## 1         1      8             0.0000418305 43.045417 43.045417 43.045417
## 2         1     10             0.0000418305 37.128981 37.128981 37.128981
## 3         1      8             0.0000418305  6.935891  6.935891  6.935891
## 4         1      8             0.0000418305  5.843542  5.843542  5.843542
## 5         1      8             0.0000418305 47.909236 47.909236 47.909236
## 6         1      7             0.0000418305 52.520150 52.520150 52.520150
## 7         1      6             0.0000418305 27.929421 27.929421 27.929421
## 8        22      6             0.0009202711  8.847407 10.729068 18.830881
## 9         1      7             0.0000418305 16.675417 16.675417 16.675417
## 10        9      7             0.0003764745  2.868090 10.044618 13.300734
##       median        q3       max   st_dev       iqr      total
## 1  43.045417 43.045417 43.045417       NA  0.000000  43.045417
## 2  37.128981 37.128981 37.128981       NA  0.000000  37.128981
## 3   6.935891  6.935891  6.935891       NA  0.000000   6.935891
## 4   5.843542  5.843542  5.843542       NA  0.000000   5.843542
## 5  47.909236 47.909236 47.909236       NA  0.000000  47.909236
## 6  52.520150 52.520150 52.520150       NA  0.000000  52.520150
## 7  27.929421 27.929421 27.929421       NA  0.000000  27.929421
## 8  18.099867 23.147127 36.718900 8.847696 12.418058 414.279387
## 9  16.675417 16.675417 16.675417       NA  0.000000  16.675417
## 10 11.142847 19.081192 27.141308 7.945907  9.036574 119.706609

Presenting trace coverage with a bar chart

trace_coverage %>%
  mutate(Trace_Variant = 1:nrow(.)) %>%
  head(n = 80) %>% # present the 80 traces with the highest coverage
  
  ggplot(aes(x = reorder(Trace_Variant,absolute), y = absolute)) + #order value ascending
  theme_minimal() +
  geom_col() +
  scale_x_discrete(guide = guide_axis(angle = 90)) +
  labs(title = "Most frequent trace",
       x = "Traces",
       y = "Count")

These 80 trace variants cover almost all cases in the log.
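This claim can be quantified with a cumulative-coverage curve. A minimal sketch on hypothetical variant frequencies (for the real log, substitute `trace_coverage$absolute`):

```r
# Cumulative share of cases explained by the most frequent trace variants
freq <- c(964, 420, 300, 150, 80, 40, 25, 12, 6, 3)  # hypothetical frequencies
cum_share <- cumsum(sort(freq, decreasing = TRUE)) / sum(freq)
round(cum_share, 3)  # monotonically approaches 1 as variants are added
```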

Throughput Time Analysis

What about the average throughput time per case and per month?

throughput_per_case <- throughput_time(ACME.log, "case") #delivers the throughput time of each case

throughput_per_month <- ACME.log %>% # average throughput time of cases per month
  distinct(Month, CASE_ID) %>%
  left_join(throughput_per_case, by = "CASE_ID") %>%
  group_by(Month) %>%
  summarize(Throughput.Avg = mean(throughput_time)) %>%
  arrange(Month)

Average_throughput_per_case <- mean(throughput_per_case$throughput_time) # average throughput time over the whole log

Case throughput distribution

summary(throughput_per_case$throughput_time) # delivers distribution of throughput time
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.3474    7.9895   15.0930   20.1811   24.9501 1803.9291
quantile(throughput_per_case$throughput_time, probs = seq(0.1, 1, 0.1)) # deciles of throughput time (steps of 10%)
##         10%         20%         30%         40%         50%         60% 
##    4.040179    6.963762    9.161169   12.794653   15.093003   18.910567 
##         70%         80%         90%        100% 
##   22.106551   27.984931   38.841794 1803.929132

Presenting it with a line chart

ggplot(throughput_per_month, aes(x = Month, y = Throughput.Avg)) +
  theme_bw() +
  geom_line(size = 0.75) +
  labs(x = "Month & Year", y = "Average Case Throughput in Days",
       title = "Case Throughput Averages for Complete Log File")

head(throughput_per_month, nrow(throughput_per_month))
## # A tibble: 19 x 2
##    Month               Throughput.Avg
##    <dttm>                       <dbl>
##  1 2013-05-01 00:00:00        1793.  
##  2 2018-01-01 00:00:00          16.4 
##  3 2018-02-01 00:00:00          17.5 
##  4 2018-03-01 00:00:00          20.7 
##  5 2018-04-01 00:00:00          19.3 
##  6 2018-05-01 00:00:00          21.6 
##  7 2018-06-01 00:00:00          19.1 
##  8 2018-07-01 00:00:00          25.3 
##  9 2018-08-01 00:00:00          24.9 
## 10 2018-09-01 00:00:00          21.9 
## 11 2018-10-01 00:00:00          19.5 
## 12 2018-11-01 00:00:00          18.5 
## 13 2018-12-01 00:00:00          22.6 
## 14 2019-01-01 00:00:00          20.5 
## 15 2019-02-01 00:00:00          20.4 
## 16 2019-03-01 00:00:00          19.4 
## 17 2019-04-01 00:00:00          19.6 
## 18 2019-05-01 00:00:00          16.7 
## 19 2019-06-01 00:00:00           9.86

Advanced Analysis

Create an activity profile based on the event log data

activity_profile <- as.data.frame.matrix(table(ACME.log[c(1,2)])) # case-by-activity cross-tab; an activity occurring in a case is counted (1 if it occurs once)
head(activity_profile,10)
##           Approved Completed Creation DeviceReceived FreeticketComp
## Case10           0         1        1              1              0
## Case100          0         1        1              1              0
## Case1000         0         1        1              1              0
## Case10002        0         1        1              1              0
## Case10003        0         1        1              1              0
## Case10006        1         1        1              1              0
## Case10007        0         1        1              1              0
## Case10008        0         1        1              1              0
## Case10009        0         1        1              1              0
## Case1001         1         1        1              1              0
##           FreeticketCust InDelivery Letter NoteHotline NoteWorkshop
## Case10                 0          1      1           0            1
## Case100                0          1      1           1            1
## Case1000               0          1      1           0            1
## Case10002              0          1      0           1            1
## Case10003              0          1      1           1            1
## Case10006              0          1      1           1            0
## Case10007              0          1      0           1            1
## Case10008              0          1      0           1            1
## Case10009              0          1      1           1            1
## Case1001               0          1      1           1            0
##           StatusRequest StockEntry Transmission
## Case10                0          1            0
## Case100               1          1            0
## Case1000              0          0            0
## Case10002             0          0            0
## Case10003             0          1            0
## Case10006             0          0            1
## Case10007             0          0            0
## Case10008             0          1            0
## Case10009             0          0            0
## Case1001              0          0            1

Happy Path Discovery

Identifying the happy path

happy_path = head(trace_explorer[order(-trace_explorer$absolute),],1) # the most frequent trace is the happy path

print(paste("Happy Path:",happy_path$trace))
## [1] "Happy Path: Creation,DeviceReceived,Letter,Transmission,Approved,NoteHotline,InDelivery,Completed"
print(paste("Happy Path's absolute frequency:",happy_path$absolute, "times"))
## [1] "Happy Path's absolute frequency: 964 times"
print(paste("Happy Path's average throughput time:",round(happy_path$mean), "days"))
## [1] "Happy Path's average throughput time: 24 days"

Happy path visualization

ACME.log %>%
  group_by_case() %>%
  trace_explorer(coverage = 0.20, coverage_labels = c("absolute","relative", "cumulative")) #Coverage grade: 20%
## Warning: `rename_()` was deprecated in dplyr 0.7.0.
## Please use `rename()` instead.

Time period

#Dotted chart of cases that start within the given time frame
ACME.log %>%
    filter_time_period(interval = ymd(c(20180101, 20181231)), filter_method = "start") %>%
    dotted_chart
## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## Warning: The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
## Please use the `.add` argument instead.
## Joining, by = "CASE_ID"
## Warning: It is deprecated to specify `guide = FALSE` to remove a guide. Please
## use `guide = "none"` instead.

Presenting the process with a process map

ACME.log %>%
  filter_activity_frequency(percentage = 1.0) %>% 
  filter_trace_frequency(percentage = .5) %>%
  filter_endpoints(start_activities = "Creation", end_activities = "Completed") %>%
  
  process_map(type = frequency("absolute"))

Performance Profile

Monthly Performance Profile

monthly_performance_profile <- merge(case_count_per_month,throughput_per_month,by = "Month")
head(monthly_performance_profile,10)
##         Month Case_count       Anteil Throughput.Avg
## 1  2013-05-01          3 0.0001254915     1793.37977
## 2  2018-01-01       1373 0.0574332803       16.43705
## 3  2018-02-01       1174 0.0491090103       17.50974
## 4  2018-03-01       1286 0.0537940266       20.71593
## 5  2018-04-01       1127 0.0471429767       19.28143
## 6  2018-05-01       1222 0.0511168744       21.61522
## 7  2018-06-01       1259 0.0526646030       19.08236
## 8  2018-07-01       1310 0.0547979587       25.32606
## 9  2018-08-01       1298 0.0542959926       24.88538
## 10 2018-09-01       1279 0.0535012131       21.85503

Presenting the process with a performance process map

ACME.log %>%
  filter_activity_frequency(percentage = 1.0) %>% 
  filter_trace_frequency(percentage = .5) %>%
  filter_endpoints(start_activities = "Creation", end_activities = "Completed") %>%
  
  process_map(performance(median, "days"))

Clustering Analysis

means <- apply(activity_profile,2 ,mean) # calculate column means

sds <- apply(activity_profile,2,sd) # calculate column standard deviations

nor <- scale(activity_profile,center=means,scale=sds+1) # normalization (some columns have a standard deviation of 0, so we add 1 to avoid NaN values; results may be slightly affected)

nor1 <-as.data.frame(nor)

distance = dist(nor,method = "euclidean") # Euclidean distance

Clustering with a self-organizing map

set.seed(123)
g <- somgrid(xdim = 4, ydim = 4, topo = "rectangular" )
map <- som(nor,
           grid = g,
           alpha = c(0.05, 0.01),
           radius = 1)
plot(map,type = "codes")

Hierarchical agglomerative clustering

mydata.hclust = hclust(distance,method ="average")
plot(mydata.hclust)

Scree plot

wss <- (nrow(activity_profile)-1)*sum(apply(activity_profile,2,var))
for (i in 1:20) wss[i] <- sum(kmeans(activity_profile, centers=i)$withinss)
plot(1:20, wss, type="b", xlab="Number of Clusters", ylab="Within-groups sum of squares") 

K-means clustering

set.seed(123)
kc<-kmeans(nor,10)

Visualization

fviz_cluster(kc, data =nor[,-c(4,7)]) # visualize without the columns "DeviceReceived" and "InDelivery": these activities occur in almost every case, so their standard deviation is nearly 0 and after normalization the columns contain only NaN/0 values, which prevents the visualization
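The NaN issue can be demonstrated in isolation: with default scaling, `scale()` divides a (near-)constant column by a standard deviation of 0, which yields NaN throughout.

```r
# A constant column has sd 0, so default scaling produces only NaN values
x <- c(1, 1, 1)
scale(x)[, 1]  # NaN NaN NaN
```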

Proposed Solution

ACME struggles with its customer service because completing a single customer request takes a long time. To achieve its goals, ACME needs to reduce the overall throughput time and the complexity of its process model. We suggest that ACME focus on reducing the throughput time between the following activities: DeviceReceived, Letter, and Transmission.